Binary tree-structured vector quantization approach to clustering and visualizing microarray data
Abstract
MOTIVATION: With the increasing number of gene expression databases, the need for more powerful analysis and visualization tools is growing. Many techniques have been applied successfully to unravel latent similarities among genes and/or experiments. Most current systems for microarray data analysis use statistical methods, hierarchical clustering, self-organizing maps, support vector machines, or k-means clustering to organize genes or experiments into 'meaningful' groups. Without prior explicit bias, almost all of these clustering methods, when applied to gene expression data, not only produce different results but may also produce clusters with little or no biological relevance. Of these methods, agglomerative hierarchical clustering has been the most widely applied, although many limitations have been identified.
RESULTS: Starting with a systematic comparison of the underlying theories behind clustering approaches, we have devised a technique that combines tree-structured vector quantization and partitive k-means clustering (BTSVQ). This hybrid technique has revealed clinically relevant clusters in three large publicly available data sets. In contrast to existing systems, our approach is less sensitive to data preprocessing and data normalization. In addition, the clustering results produced by the technique have strong similarities to those of self-organizing maps (SOMs). We discuss the advantages and the mathematical reasoning behind our approach.
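The abstract does not spell out the BTSVQ procedure, so the following is only a minimal sketch of the general idea it names: a binary tree built by recursively splitting each node with k-means (k = 2), followed by tree-structured quantization of a new vector. All function names, the deterministic initialization heuristic, and the stopping rules (`max_depth`, `min_size`) are assumptions made for illustration, not details taken from the paper.

```python
def mean(points):
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=50):
    """Split points into two groups with plain k-means (k = 2).

    Assumption: the initial centers are the point farthest from the
    overall centroid and the point farthest from that one. This keeps
    the sketch deterministic; it is not claimed to be the paper's choice.
    """
    center = mean(points)
    first = max(points, key=lambda p: sqdist(p, center))
    second = max(points, key=lambda p: sqdist(p, first))
    centers = [list(first), list(second)]
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer = 0 if sqdist(p, centers[0]) <= sqdist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all points identical
        new_centers = [mean(g) for g in groups]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return groups


def build_tree(points, max_depth=3, min_size=2, depth=0):
    """Recursively build a binary tree; each node stores its centroid."""
    node = {"centroid": mean(points), "size": len(points), "children": []}
    if depth < max_depth and len(points) >= 2 * min_size:
        left, right = two_means(points)
        if len(left) >= min_size and len(right) >= min_size:
            node["children"] = [
                build_tree(g, max_depth, min_size, depth + 1)
                for g in (left, right)
            ]
    return node


def leaves(node):
    """Collect the leaf nodes (the final clusters) of the tree."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


def quantize(node, x):
    """Descend the tree, following the child whose centroid is nearer to x."""
    while node["children"]:
        node = min(node["children"], key=lambda c: sqdist(x, c["centroid"]))
    return node


# Toy data standing in for expression profiles: two well-separated groups.
data = [[0, 0], [0, 1], [1, 0], [1, 1],
        [10, 10], [10, 11], [11, 10], [11, 11]]
tree = build_tree(data, max_depth=1)
for leaf in leaves(tree):
    print(leaf["size"], leaf["centroid"])
```

Each level of the tree gives a coarser or finer partition of the same data, which is one reason a tree-structured quantizer lends itself to visualization; how the published method chooses when to stop splitting is not stated in the abstract.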
Similar resources
A genetic approach to the design of general-tree-structured vector quantizers for speech coding
Full-search vector quantization suffers from spending much time searching the whole codebook sequentially. Recently, several tree-structured vector quantizers have been proposed. But almost all trees used are binary trees, and hence the training samples contained in each node are forced to be divided into two clusters artificially. We present a general-tree-structured vector quantizer that is...
Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition
In this paper, the use of clustering algorithms for data fusion at the decision level is proposed. The results of automatic isolated word recognition, which are derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined with each other by using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...
Clustering methods for the analysis of DNA microarray data
It is now possible to simultaneously measure the expression of thousands of genes during cellular differentiation and response, through the use of DNA microarrays. A major statistical task is to understand the structure in the data that arise from this technology. In this paper we review various methods of clustering, and illustrate how they can be used to arrange both the genes and cell lines f...
Classification and Compression of Multi-Resolution Vectors: A Tree Structured Vector Quantizer Approach
Title of Dissertation: Classification and Compression of Multiresolution Vectors: A Tree Structured Vector Quantizer Approach. Sudhir Varma, Doctor of Philosophy, 2002. Dissertation directed by: Professor John S. Baras, Department of Electrical and Computer Engineering. Tree structured classifiers and quantizers have been used with good success for problems ranging from successive refinement coding...
Journal title:
- Bioinformatics
Volume 18 Suppl 1
Pages -
Publication date 2002